In [1]:
%run dataFormating.ipynb


rmdfTestUsers read_csv success (1/3)
rmdf1522 read_csv success (2/3)
rmdf160 read_csv success (3/3)
gform read_csv success
gformFR read_csv success
temporalities set (user answer method)
profile info set

What subsets of scientific questions tend to be answered correctly by the same subjects?

Mining


In [2]:
import math
import pandas as pd
from orangecontrib.associate.fpgrowth import frequent_itemsets, association_rules

In [3]:
# One transaction per subject: the list of scientific questions that subject answered correctly
questions = correctedScientific.columns
correctedScientificText = [
    [q for q in questions if correctedScientific.loc[r, q]]
    for r in correctedScientific.index
]
#correctedScientificText

In [4]:
len(correctedScientificText)


Out[4]:
252

In [5]:
# Get frequent itemsets with support >= 20%,
# passed as an absolute number of subjects: floor(252 * 0.20) = 50
# run time < 1 min
support = 0.20
itemsets = frequent_itemsets(correctedScientificText, math.floor(len(correctedScientificText) * support))
#dict(itemsets)
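To make the support threshold concrete: an itemset is frequent when at least the given number of subjects answered all of its questions correctly, and the cell above passes floor(252 × 0.20) = 50 as an absolute count, which matches the smallest support (50) in the outputs below. The sketch is a naive level-wise (Apriori-style) enumeration, not FP-growth; `frequent_itemsets_naive` and the toy questions `QA`, `QB`, `QC` are hypothetical. It returns an {itemset: count} dict, assumed to have the same shape as `dict(itemsets)`.

```python
from math import floor


def frequent_itemsets_naive(transactions, min_count):
    """Return {frozenset(items): count} for every itemset contained in at least
    `min_count` transactions. Level-wise (Apriori-style) brute force, for illustration."""
    transactions = [frozenset(t) for t in transactions]
    result = {}
    # Level 1: every single item is a candidate
    candidates = {frozenset([item]) for t in transactions for item in t}
    size = 1
    while candidates:
        frequent = []
        for candidate in candidates:
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_count:
                result[candidate] = count
                frequent.append(candidate)
        size += 1
        # Next level: unions of two frequent itemsets that are exactly one item larger
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == size}
    return result


# Toy data: each transaction lists the (hypothetical) questions one subject got right
transactions = [["QA", "QB"], ["QA", "QB", "QC"], ["QA"], ["QB", "QC"], ["QA", "QB"]]
itemset_counts = frequent_itemsets_naive(transactions, min_count=2)
# {QA}: 4, {QB}: 4, {QC}: 2, {QA, QB}: 3, {QB, QC}: 2
# ({QA, QC} and {QA, QB, QC} occur only once, so they are not frequent)

# The notebook's threshold: 20% of 252 subjects, floored to an absolute count
print(floor(252 * 0.20))  # 50
```

Candidates of size k are built only from frequent itemsets of size k - 1, which is safe because every subset of a frequent itemset is itself frequent.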

In [6]:
# Generate rules with confidence >= 80%
# run time < 5 min
confidence = 0.80
rules = association_rules(dict(itemsets), confidence)
#list(rules)
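Rule generation can be read as follows: for every frequent itemset S and every non-empty proper subset A of S, emit the rule A → (S minus A) when confidence = count(S) / count(A) reaches the threshold. The `support` column is then count(S), an absolute number of subjects, and the 0.800000 rows in the outputs below suggest the threshold is inclusive. This is a minimal sketch assuming the {itemset: count} input shape; `association_rules_naive` and the toy counts are hypothetical and not Orange's implementation.

```python
from itertools import combinations


def association_rules_naive(itemset_counts, min_confidence):
    """Yield (antecedent, consequent, support, confidence) tuples for each rule
    A -> (S minus A) with S frequent and count(S) / count(A) >= min_confidence."""
    for itemset, count in itemset_counts.items():
        for k in range(1, len(itemset)):  # antecedent sizes 1 .. |S| - 1
            for items in combinations(sorted(itemset), k):
                antecedent = frozenset(items)
                # Every subset of a frequent itemset is frequent, so its count is known
                confidence = count / itemset_counts[antecedent]
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, count, confidence


# Hypothetical itemset counts (absolute numbers of subjects)
itemset_counts = {
    frozenset({"QA"}): 4, frozenset({"QB"}): 4, frozenset({"QC"}): 2,
    frozenset({"QA", "QB"}): 3, frozenset({"QB", "QC"}): 2,
}
rules = list(association_rules_naive(itemset_counts, min_confidence=0.80))
# Only QC -> QB survives (2/2 = 1.0); QA -> QB and QB -> QA (3/4) and QB -> QC (2/4) fall below 0.80
```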

In [7]:
# Transform the rules generator into a DataFrame
rulesDataframe = pd.DataFrame(list(rules), columns=["antecedants", "consequents", "support", "confidence"])
rulesDataframe.head()


Out[7]:
antecedants consequents support confidence
0 (QDeviceRbsPconsAmprTer, QDeviceGfpRbsPconsTer... (QDeviceRbsPconsFlhdcTer) 52 1.000000
1 (QDeviceGfpRbsPconsTer, QGenotypePhenotype) (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprTer) 52 0.962963
2 (QGenotypePhenotype, QDeviceRbsPconsAmprTer) (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTer) 52 0.866667
3 (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprT... (QDeviceGfpRbsPconsTer) 52 0.881356
4 (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTe... (QGenotypePhenotype) 52 0.896552

In [8]:
# Save the mined rules to file
rulesDataframe.to_csv("results/associationRulesMiningSupport"+str(support)+"percentsConfidence"+str(confidence)+"percents.csv")

Search for interesting rules

The most interesting rules are likely to be those with the highest confidence, the highest lift, or the largest consequent sets. Pairs (rules with a single question on each side) can also be especially interesting.
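Lift is not computed anywhere in this notebook. For a rule A → C it is the confidence divided by the baseline rate of C among all subjects, lift = N · count(A ∪ C) / (count(A) · count(C)); a value above 1 means that subjects who answered A correctly were more likely than average to answer C correctly. The helper below is a hypothetical sketch assuming absolute counts, as in the `support` column. The example counts are read off the outputs below: rule 100 (QDeviceGfpRbsPconsTer → QDeviceRbsPconsFlhdcTer) has support 62 and confidence 1.0, so count(A) = 62; rule 94 has support 71 and confidence 0.865854, so count(QDeviceRbsPconsFlhdcTer) = 82; and N = 252.

```python
def lift(count_ac, count_a, count_c, n_subjects):
    """Lift of rule A -> C from absolute counts.

    count_ac: subjects who answered every question in A and C correctly (the `support` column)
    count_a:  subjects who answered every question in A correctly
    count_c:  subjects who answered every question in C correctly
    """
    confidence = count_ac / count_a
    baseline = count_c / n_subjects  # how often C is answered correctly among all subjects
    return confidence / baseline


# Rule 100: QDeviceGfpRbsPconsTer -> QDeviceRbsPconsFlhdcTer, with counts derived as described above
print(round(lift(62, 62, 82, 252), 2))  # 3.07
```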


In [9]:
# Sort rules by confidence, then by support (both descending)
confidenceSortedRules = rulesDataframe.sort_values(by = ["confidence", "support"], ascending=[False, False])
confidenceSortedRules.head(50)


Out[9]:
antecedants consequents support confidence
100 (QDeviceGfpRbsPconsTer) (QDeviceRbsPconsFlhdcTer) 62 1.000000
61 (QDeviceGfpRbsPconsTer, QDeviceRbsPconsAmprTer) (QDeviceRbsPconsFlhdcTer) 58 1.000000
79 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsAmprTer) (QDeviceRbsPconsFlhdcTer) 58 1.000000
71 (QDeviceAmprRbsPconsTer, QDeviceGfpRbsPconsTer) (QDeviceRbsPconsFlhdcTer) 56 1.000000
11 (QDeviceAmprRbsPconsTer, QDeviceGfpRbsPconsTer... (QDeviceRbsPconsFlhdcTer) 55 1.000000
50 (QDeviceGfpRbsPconsTer, QGenotypePhenotype) (QDeviceRbsPconsFlhdcTer) 54 1.000000
0 (QDeviceRbsPconsAmprTer, QDeviceGfpRbsPconsTer... (QDeviceRbsPconsFlhdcTer) 52 1.000000
24 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsAmprTe... (QDeviceRbsPconsFlhdcTer) 52 1.000000
26 (QDeviceAmprRbsPconsTer, QGenotypePhenotype) (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprTer) 52 1.000000
32 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcT... (QDeviceRbsPconsAmprTer) 52 1.000000
75 (QDeviceAmprRbsPconsTer, QGenotypePhenotype) (QDeviceRbsPconsFlhdcTer) 52 1.000000
92 (QDeviceAmprRbsPconsTer, QGenotypePhenotype) (QDeviceRbsPconsAmprTer) 52 1.000000
48 (QDeviceRbsPconsAmprTer, QAmpicillin) (QDeviceRbsPconsFlhdcTer) 51 1.000000
53 (QDeviceGfpRbsPconsTer, QGreenFluorescence) (QDeviceRbsPconsFlhdcTer) 51 1.000000
66 (QDeviceGfpRbsPconsTer, QAmpicillin) (QDeviceRbsPconsFlhdcTer) 50 1.000000
97 (QDeviceRbsPconsAmprTer) (QDeviceRbsPconsFlhdcTer) 65 0.984848
42 (QDeviceRbsPconsAmprTer, QGenotypePhenotype) (QDeviceRbsPconsFlhdcTer) 59 0.983333
82 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcTer) (QDeviceRbsPconsAmprTer) 58 0.983051
15 (QDeviceAmprRbsPconsTer, QDeviceGfpRbsPconsTer) (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprTer) 55 0.982143
20 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcT... (QDeviceRbsPconsAmprTer) 55 0.982143
86 (QDeviceAmprRbsPconsTer, QDeviceGfpRbsPconsTer) (QDeviceRbsPconsAmprTer) 55 0.982143
46 (QGreenFluorescence, QDeviceRbsPconsAmprTer) (QDeviceRbsPconsFlhdcTer) 50 0.980392
105 (QDeviceAmprRbsPconsTer) (QDeviceRbsPconsFlhdcTer) 59 0.967213
1 (QDeviceGfpRbsPconsTer, QGenotypePhenotype) (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprTer) 52 0.962963
9 (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTe... (QDeviceRbsPconsAmprTer) 52 0.962963
60 (QDeviceGfpRbsPconsTer, QGenotypePhenotype) (QDeviceRbsPconsAmprTer) 52 0.962963
81 (QDeviceAmprRbsPconsTer) (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprTer) 58 0.950820
110 (QDeviceAmprRbsPconsTer) (QDeviceRbsPconsAmprTer) 58 0.950820
74 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcTer) (QDeviceGfpRbsPconsTer) 56 0.949153
10 (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTe... (QDeviceAmprRbsPconsTer) 55 0.948276
12 (QDeviceGfpRbsPconsTer, QDeviceRbsPconsAmprTer) (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcTer) 55 0.948276
13 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsAmprTer) (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTer) 55 0.948276
18 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcT... (QDeviceGfpRbsPconsTer) 55 0.948276
83 (QDeviceGfpRbsPconsTer, QDeviceRbsPconsAmprTer) (QDeviceAmprRbsPconsTer) 55 0.948276
84 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsAmprTer) (QDeviceGfpRbsPconsTer) 55 0.948276
63 (QDeviceGfpRbsPconsTer) (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprTer) 58 0.935484
65 (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTer) (QDeviceRbsPconsAmprTer) 58 0.935484
103 (QDeviceGfpRbsPconsTer) (QDeviceRbsPconsAmprTer) 58 0.935484
22 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcTer) (QDeviceGfpRbsPconsTer, QDeviceRbsPconsAmprTer) 55 0.932203
112 (QBBFunctionTER) (QGenotypePhenotype) 50 0.925926
73 (QDeviceAmprRbsPconsTer) (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTer) 56 0.918033
107 (QDeviceAmprRbsPconsTer) (QDeviceGfpRbsPconsTer) 56 0.918033
98 (QDeviceRbsPconsAmprTer) (QGenotypePhenotype) 60 0.909091
96 (QBioBricksDevicesComposition) (QGenotypePhenotype) 69 0.907895
43 (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprTer) (QGenotypePhenotype) 59 0.907692
70 (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTer) (QDeviceAmprRbsPconsTer) 56 0.903226
72 (QDeviceGfpRbsPconsTer) (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcTer) 56 0.903226
106 (QDeviceGfpRbsPconsTer) (QDeviceAmprRbsPconsTer) 56 0.903226
17 (QDeviceAmprRbsPconsTer) (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTe... 55 0.901639
88 (QDeviceAmprRbsPconsTer) (QDeviceGfpRbsPconsTer, QDeviceRbsPconsAmprTer) 55 0.901639

In [10]:
# Sort rules by size of consequent set
rulesDataframe["consequentSize"] = rulesDataframe["consequents"].apply(lambda x: len(x))
consequentSortedRules = rulesDataframe.sort_values(by = ["consequentSize", "confidence", "support"], ascending=[False, False, False])
consequentSortedRules.head(50)


Out[10]:
antecedants consequents support confidence consequentSize
17 (QDeviceAmprRbsPconsTer) (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTe... 55 0.901639 3
16 (QDeviceGfpRbsPconsTer) (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcT... 55 0.887097 3
30 (QDeviceAmprRbsPconsTer) (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprT... 52 0.852459 3
6 (QDeviceGfpRbsPconsTer) (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprT... 52 0.838710 3
14 (QDeviceRbsPconsAmprTer) (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcT... 55 0.833333 3
26 (QDeviceAmprRbsPconsTer, QGenotypePhenotype) (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprTer) 52 1.000000 2
15 (QDeviceAmprRbsPconsTer, QDeviceGfpRbsPconsTer) (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprTer) 55 0.982143 2
1 (QDeviceGfpRbsPconsTer, QGenotypePhenotype) (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprTer) 52 0.962963 2
81 (QDeviceAmprRbsPconsTer) (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprTer) 58 0.950820 2
12 (QDeviceGfpRbsPconsTer, QDeviceRbsPconsAmprTer) (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcTer) 55 0.948276 2
13 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsAmprTer) (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTer) 55 0.948276 2
63 (QDeviceGfpRbsPconsTer) (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprTer) 58 0.935484 2
22 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcTer) (QDeviceGfpRbsPconsTer, QDeviceRbsPconsAmprTer) 55 0.932203 2
73 (QDeviceAmprRbsPconsTer) (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTer) 56 0.918033 2
72 (QDeviceGfpRbsPconsTer) (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcTer) 56 0.903226 2
88 (QDeviceAmprRbsPconsTer) (QDeviceGfpRbsPconsTer, QDeviceRbsPconsAmprTer) 55 0.901639 2
5 (QDeviceGfpRbsPconsTer, QDeviceRbsPconsAmprTer) (QDeviceRbsPconsFlhdcTer, QGenotypePhenotype) 52 0.896552 2
29 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsAmprTer) (QDeviceRbsPconsFlhdcTer, QGenotypePhenotype) 52 0.896552 2
44 (QDeviceRbsPconsAmprTer) (QDeviceRbsPconsFlhdcTer, QGenotypePhenotype) 59 0.893939 2
21 (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTer) (QDeviceAmprRbsPconsTer, QDeviceRbsPconsAmprTer) 55 0.887097 2
87 (QDeviceGfpRbsPconsTer) (QDeviceAmprRbsPconsTer, QDeviceRbsPconsAmprTer) 55 0.887097 2
31 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcTer) (QDeviceRbsPconsAmprTer, QGenotypePhenotype) 52 0.881356 2
62 (QDeviceRbsPconsAmprTer) (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTer) 58 0.878788 2
80 (QDeviceRbsPconsAmprTer) (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcTer) 58 0.878788 2
52 (QDeviceGfpRbsPconsTer) (QDeviceRbsPconsFlhdcTer, QGenotypePhenotype) 54 0.870968 2
2 (QGenotypePhenotype, QDeviceRbsPconsAmprTer) (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTer) 52 0.866667 2
25 (QGenotypePhenotype, QDeviceRbsPconsAmprTer) (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcTer) 52 0.866667 2
77 (QDeviceAmprRbsPconsTer) (QDeviceRbsPconsFlhdcTer, QGenotypePhenotype) 52 0.852459 2
91 (QDeviceAmprRbsPconsTer) (QDeviceRbsPconsAmprTer, QGenotypePhenotype) 52 0.852459 2
19 (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprTer) (QDeviceAmprRbsPconsTer, QDeviceGfpRbsPconsTer) 55 0.846154 2
8 (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTer) (QDeviceRbsPconsAmprTer, QGenotypePhenotype) 52 0.838710 2
59 (QDeviceGfpRbsPconsTer) (QDeviceRbsPconsAmprTer, QGenotypePhenotype) 52 0.838710 2
85 (QDeviceRbsPconsAmprTer) (QDeviceAmprRbsPconsTer, QDeviceGfpRbsPconsTer) 55 0.833333 2
56 (QDeviceGfpRbsPconsTer) (QDeviceRbsPconsFlhdcTer, QGreenFluorescence) 51 0.822581 2
67 (QDeviceGfpRbsPconsTer) (QDeviceRbsPconsFlhdcTer, QAmpicillin) 50 0.806452 2
7 (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprTer) (QDeviceGfpRbsPconsTer, QGenotypePhenotype) 52 0.800000 2
28 (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprTer) (QDeviceAmprRbsPconsTer, QGenotypePhenotype) 52 0.800000 2
100 (QDeviceGfpRbsPconsTer) (QDeviceRbsPconsFlhdcTer) 62 1.000000 1
61 (QDeviceGfpRbsPconsTer, QDeviceRbsPconsAmprTer) (QDeviceRbsPconsFlhdcTer) 58 1.000000 1
79 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsAmprTer) (QDeviceRbsPconsFlhdcTer) 58 1.000000 1
71 (QDeviceAmprRbsPconsTer, QDeviceGfpRbsPconsTer) (QDeviceRbsPconsFlhdcTer) 56 1.000000 1
11 (QDeviceAmprRbsPconsTer, QDeviceGfpRbsPconsTer... (QDeviceRbsPconsFlhdcTer) 55 1.000000 1
50 (QDeviceGfpRbsPconsTer, QGenotypePhenotype) (QDeviceRbsPconsFlhdcTer) 54 1.000000 1
0 (QDeviceRbsPconsAmprTer, QDeviceGfpRbsPconsTer... (QDeviceRbsPconsFlhdcTer) 52 1.000000 1
24 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsAmprTe... (QDeviceRbsPconsFlhdcTer) 52 1.000000 1
32 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcT... (QDeviceRbsPconsAmprTer) 52 1.000000 1
75 (QDeviceAmprRbsPconsTer, QGenotypePhenotype) (QDeviceRbsPconsFlhdcTer) 52 1.000000 1
92 (QDeviceAmprRbsPconsTer, QGenotypePhenotype) (QDeviceRbsPconsAmprTer) 52 1.000000 1
48 (QDeviceRbsPconsAmprTer, QAmpicillin) (QDeviceRbsPconsFlhdcTer) 51 1.000000 1
53 (QDeviceGfpRbsPconsTer, QGreenFluorescence) (QDeviceRbsPconsFlhdcTer) 51 1.000000 1

In [11]:
# Sort rules by total size (antecedent plus consequent) so that pairs (one question on each side) come first,
# then by confidence and support. This sorts rather than filters: larger rules follow the pairs.
rulesDataframe["fusedRule"] = rulesDataframe[["antecedants", "consequents"]].apply(lambda x: frozenset().union(*x), axis=1)
rulesDataframe["ruleSize"] = rulesDataframe["fusedRule"].apply(lambda x: len(x))
pairRules = rulesDataframe.sort_values(by=["ruleSize", "confidence", "support"], ascending=[True, False, False])
pairRules.head(30)


Out[11]:
antecedants consequents support confidence consequentSize fusedRule ruleSize
100 (QDeviceGfpRbsPconsTer) (QDeviceRbsPconsFlhdcTer) 62 1.000000 1 (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTer) 2
97 (QDeviceRbsPconsAmprTer) (QDeviceRbsPconsFlhdcTer) 65 0.984848 1 (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprTer) 2
105 (QDeviceAmprRbsPconsTer) (QDeviceRbsPconsFlhdcTer) 59 0.967213 1 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcTer) 2
110 (QDeviceAmprRbsPconsTer) (QDeviceRbsPconsAmprTer) 58 0.950820 1 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsAmprTer) 2
103 (QDeviceGfpRbsPconsTer) (QDeviceRbsPconsAmprTer) 58 0.935484 1 (QDeviceGfpRbsPconsTer, QDeviceRbsPconsAmprTer) 2
112 (QBBFunctionTER) (QGenotypePhenotype) 50 0.925926 1 (QBBFunctionTER, QGenotypePhenotype) 2
107 (QDeviceAmprRbsPconsTer) (QDeviceGfpRbsPconsTer) 56 0.918033 1 (QDeviceAmprRbsPconsTer, QDeviceGfpRbsPconsTer) 2
98 (QDeviceRbsPconsAmprTer) (QGenotypePhenotype) 60 0.909091 1 (QGenotypePhenotype, QDeviceRbsPconsAmprTer) 2
96 (QBioBricksDevicesComposition) (QGenotypePhenotype) 69 0.907895 1 (QBioBricksDevicesComposition, QGenotypePhenot... 2
106 (QDeviceGfpRbsPconsTer) (QDeviceAmprRbsPconsTer) 56 0.903226 1 (QDeviceAmprRbsPconsTer, QDeviceGfpRbsPconsTer) 2
102 (QDeviceRbsPconsAmprTer) (QDeviceGfpRbsPconsTer) 58 0.878788 1 (QDeviceGfpRbsPconsTer, QDeviceRbsPconsAmprTer) 2
109 (QDeviceRbsPconsAmprTer) (QDeviceAmprRbsPconsTer) 58 0.878788 1 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsAmprTer) 2
99 (QDeviceGfpRbsPconsTer) (QGenotypePhenotype) 54 0.870968 1 (QDeviceGfpRbsPconsTer, QGenotypePhenotype) 2
95 (QUnequipDevice) (QGenotypePhenotype) 66 0.868421 1 (QUnequipDevice, QGenotypePhenotype) 2
94 (QDeviceRbsPconsFlhdcTer) (QGenotypePhenotype) 71 0.865854 1 (QDeviceRbsPconsFlhdcTer, QGenotypePhenotype) 2
108 (QDeviceAmprRbsPconsTer) (QGenotypePhenotype) 52 0.852459 1 (QDeviceAmprRbsPconsTer, QGenotypePhenotype) 2
111 (QBBFunctionRBS) (QDeviceRbsPconsFlhdcTer) 50 0.847458 1 (QBBFunctionRBS, QDeviceRbsPconsFlhdcTer) 2
101 (QDeviceGfpRbsPconsTer) (QGreenFluorescence) 51 0.822581 1 (QDeviceGfpRbsPconsTer, QGreenFluorescence) 2
93 (QGreenFluorescence) (QGenotypePhenotype) 69 0.811765 1 (QGreenFluorescence, QGenotypePhenotype) 2
104 (QDeviceGfpRbsPconsTer) (QAmpicillin) 50 0.806452 1 (QDeviceGfpRbsPconsTer, QAmpicillin) 2
61 (QDeviceGfpRbsPconsTer, QDeviceRbsPconsAmprTer) (QDeviceRbsPconsFlhdcTer) 58 1.000000 1 (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTe... 3
79 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsAmprTer) (QDeviceRbsPconsFlhdcTer) 58 1.000000 1 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcT... 3
71 (QDeviceAmprRbsPconsTer, QDeviceGfpRbsPconsTer) (QDeviceRbsPconsFlhdcTer) 56 1.000000 1 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcT... 3
50 (QDeviceGfpRbsPconsTer, QGenotypePhenotype) (QDeviceRbsPconsFlhdcTer) 54 1.000000 1 (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTe... 3
75 (QDeviceAmprRbsPconsTer, QGenotypePhenotype) (QDeviceRbsPconsFlhdcTer) 52 1.000000 1 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsFlhdcT... 3
92 (QDeviceAmprRbsPconsTer, QGenotypePhenotype) (QDeviceRbsPconsAmprTer) 52 1.000000 1 (QDeviceAmprRbsPconsTer, QDeviceRbsPconsAmprTe... 3
48 (QDeviceRbsPconsAmprTer, QAmpicillin) (QDeviceRbsPconsFlhdcTer) 51 1.000000 1 (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprT... 3
53 (QDeviceGfpRbsPconsTer, QGreenFluorescence) (QDeviceRbsPconsFlhdcTer) 51 1.000000 1 (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTe... 3
66 (QDeviceGfpRbsPconsTer, QAmpicillin) (QDeviceRbsPconsFlhdcTer) 50 1.000000 1 (QDeviceRbsPconsFlhdcTer, QDeviceGfpRbsPconsTe... 3
42 (QDeviceRbsPconsAmprTer, QGenotypePhenotype) (QDeviceRbsPconsFlhdcTer) 59 0.983333 1 (QDeviceRbsPconsFlhdcTer, QDeviceRbsPconsAmprT... 3
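Because the cell above sorts rather than filters, rules involving three questions appear once the 20 pairs are exhausted. To keep only the pairs, one could filter first, for example `rulesDataframe[rulesDataframe["ruleSize"] == 2]` in pandas. The same idea on plain (antecedent, consequent, support, confidence) tuples, as a hypothetical sketch with made-up rules:

```python
def pair_rules(rules):
    """Keep only rules with a single antecedent and a single consequent, sorted by
    decreasing confidence and then decreasing support (as in the cell above)."""
    pairs = [rule for rule in rules if len(rule[0]) == 1 and len(rule[1]) == 1]
    return sorted(pairs, key=lambda rule: (rule[3], rule[2]), reverse=True)


# Hypothetical (antecedent, consequent, support, confidence) tuples
rules = [
    (frozenset({"QA"}), frozenset({"QB"}), 3, 0.75),
    (frozenset({"QC"}), frozenset({"QB"}), 2, 1.0),
    (frozenset({"QA", "QC"}), frozenset({"QB"}), 1, 1.0),  # three questions: not a pair
]
for antecedent, consequent, support, confidence in pair_rules(rules):
    print(set(antecedent), "->", set(consequent), support, confidence)
```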

In [12]:
correctedScientific.columns


Out[12]:
Index(['QGenotypePhenotype', 'QBioBricksDevicesComposition', 'QAmpicillin',
       'QBBNamePlasmid', 'QBBFunctionTER', 'QBBNamePromoter',
       'QBBFunctionGameCDS', 'QBBNameTerminator', 'QBBFunctionBiologyCDS',
       'QBBNameRBS', 'QBBExampleCDS', 'QBBNameCDS', 'QBBFunctionPR',
       'QBBFunctionRBS', 'QBBFunctionPlasmid', 'QBBNameOperator',
       'QDeviceRbsPconsFlhdcTer', 'QDevicePconsRbsFlhdcTer',
       'QDevicePbadRbsGfpTer', 'QDevicePbadGfpRbsTer', 'QDeviceGfpRbsPconsTer',
       'QDevicePconsGfpRbsTer', 'QDeviceAmprRbsPconsTer',
       'QDeviceRbsPconsAmprTer', 'QGreenFluorescence', 'QUnequipDevice',
       'QDevicePbadRbsAraTer'],
      dtype='object')

In [13]:
# Count how many rules contain each question in their consequents, in decreasing order
for q in scientificQuestions:
    rulesDataframe[q+"c"] = rulesDataframe["consequents"].apply(lambda x: 1 if q in x else 0)
occurenceInConsequents = rulesDataframe.loc[:,scientificQuestions[0]+"c":scientificQuestions[-1]+"c"].sum(axis=0)

occurenceInConsequents.sort_values(inplace=True, ascending=False)
occurenceInConsequents


Out[13]:
QDeviceRbsPconsFlhdcTerc         43
QGenotypePhenotypec              33
QDeviceRbsPconsAmprTerc          30
QDeviceGfpRbsPconsTerc           21
QDeviceAmprRbsPconsTerc          19
QGreenFluorescencec               5
QAmpicillinc                      4
QBBExampleCDSc                    0
QBioBricksDevicesCompositionc     0
QBBNamePlasmidc                   0
QBBFunctionTERc                   0
QBBNamePromoterc                  0
QBBFunctionGameCDSc               0
QBBNameTerminatorc                0
QBBFunctionBiologyCDSc            0
QBBNameRBSc                       0
QDevicePbadRbsAraTerc             0
QBBNameCDSc                       0
QBBFunctionPRc                    0
QUnequipDevicec                   0
QBBFunctionPlasmidc               0
QBBNameOperatorc                  0
QDevicePconsRbsFlhdcTerc          0
QDevicePbadRbsGfpTerc             0
QDevicePbadGfpRbsTerc             0
QDevicePconsGfpRbsTerc            0
QBBFunctionRBSc                   0
dtype: int64

In [14]:
# Count how many rules contain each question in their antecedents, in decreasing order
for q in scientificQuestions:
    rulesDataframe[q+"a"] = rulesDataframe["antecedants"].apply(lambda x: 1 if q in x else 0)
occurenceInAntecedants = rulesDataframe.loc[:,scientificQuestions[0]+"a":scientificQuestions[-1]+"a"].sum(axis=0)
occurenceInAntecedants.sort_values(inplace=True, ascending=False)
occurenceInAntecedants


Out[14]:
QDeviceRbsPconsAmprTera          41
QDeviceGfpRbsPconsTera           41
QDeviceRbsPconsFlhdcTera         37
QDeviceAmprRbsPconsTera          33
QGenotypePhenotypea              19
QGreenFluorescencea              10
QAmpicillina                      9
QUnequipDevicea                   4
QBioBricksDevicesCompositiona     1
QBBFunctionTERa                   1
QBBFunctionRBSa                   1
QBBNameOperatora                  0
QBBFunctionPlasmida               0
QDevicePconsRbsFlhdcTera          0
QBBFunctionPRa                    0
QBBNameCDSa                       0
QBBExampleCDSa                    0
QBBNameRBSa                       0
QBBFunctionBiologyCDSa            0
QBBNameTerminatora                0
QBBFunctionGameCDSa               0
QBBNamePromotera                  0
QDevicePbadRbsGfpTera             0
QBBNamePlasmida                   0
QDevicePbadGfpRbsTera             0
QDevicePconsGfpRbsTera            0
QDevicePbadRbsAraTera             0
dtype: int64
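Cells 13 and 14 build one 0/1 column per question and per suffix ("c" or "a"), then sum a column slice, which relies on those columns being contiguous. An equivalent count can be taken directly from the rules with `collections.Counter`; with the DataFrame it would read `Counter(q for antecedent in rulesDataframe["antecedants"] for q in antecedent)`. The sketch below works on plain rule tuples; `count_occurrences` and the toy rules are hypothetical. Questions that never appear in a rule are absent from the counters rather than listed with 0, although looking them up still returns 0.

```python
from collections import Counter


def count_occurrences(rules):
    """Return two Counters: how many rules contain each question in their
    antecedent, and how many contain it in their consequent."""
    in_antecedents, in_consequents = Counter(), Counter()
    for antecedent, consequent, _support, _confidence in rules:
        in_antecedents.update(antecedent)  # +1 for each question in the antecedent
        in_consequents.update(consequent)  # +1 for each question in the consequent
    return in_antecedents, in_consequents


# Hypothetical (antecedent, consequent, support, confidence) tuples
rules = [
    (frozenset({"QA"}), frozenset({"QB"}), 3, 0.75),
    (frozenset({"QC"}), frozenset({"QB"}), 2, 1.0),
    (frozenset({"QA", "QC"}), frozenset({"QB"}), 1, 1.0),
]
in_antecedents, in_consequents = count_occurrences(rules)
print(in_consequents.most_common())  # [('QB', 3)]
print(in_antecedents["QA"], in_antecedents["QB"])  # 2 0
```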

In [15]:
# Use the first CSV column as the row index, then clear the index name
# (assigning None replaces `del ... index.name`, which may fail on recent pandas versions)
sortedPrePostProgression = pd.read_csv("../../data/sortedPrePostProgression.csv", index_col=0)
sortedPrePostProgression.index.name = None
sortedPrePostProgression.loc['occ_ant',:] = 0
sortedPrePostProgression.loc['occ_csq',:] = 0
sortedPrePostProgression


Out[15]:
Name: Operator XXX Device: PBAD:RBS:ARA:TER Name: RBS Name: CDS Device: PBAD:RBS:GFP:TER Function - game: CDS Function - biology: CDS Name: PR Function: PR Device: PCONS:GFP:RBS:TER XXX ... Interested in video games Interested in biology Studied biology Play video games Heard about Synthetic biology or BioBricks Volunteered to answer more questions Language Enjoyed playing Played Hero.Coli Temporality
pretest 0.0 4.0 0.0 6.0 2.0 1.0 3.0 4.0 0.0 2.0 ... 77.0 74.0 6.0 74.0 0.0 100.0 6.0 6.0 0.0 0.0
posttest 4.0 15.0 13.0 21.0 20.0 18.0 22.0 25.0 21.0 32.0 ... 77.0 74.0 6.0 74.0 0.0 100.0 6.0 80.0 100.0 100.0
progression 4.0 11.0 13.0 14.0 17.0 17.0 18.0 21.0 21.0 30.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 73.0 100.0 100.0
occ_ant 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
occ_csq 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 43 columns


In [16]:
# Copy each question's rule-occurrence counts into the progression table.
# Occurrence labels are the question variable names plus an "a" or "c" suffix; each such
# variable (presumably defined by dataFormating.ipynb) holds the question's full label.
for questionA, occsA in occurenceInAntecedants.items():
    questionVariableName = questionA[:-1]  # strip the trailing "a"
    question = globals()[questionVariableName]
    sortedPrePostProgression.loc['occ_ant', question] = occsA
    sortedPrePostProgression.loc['occ_csq', question] = occurenceInConsequents.loc[questionVariableName + "c"]
sortedPrePostProgression.T


Out[16]:
pretest posttest progression occ_ant occ_csq
Name: Operator XXX 0.0 4.0 4.0 0.0 0.0
Device: PBAD:RBS:ARA:TER 4.0 15.0 11.0 0.0 0.0
Name: RBS 0.0 13.0 13.0 0.0 0.0
Name: CDS 6.0 21.0 14.0 0.0 0.0
Device: PBAD:RBS:GFP:TER 2.0 20.0 17.0 0.0 0.0
Function - game: CDS 1.0 18.0 17.0 0.0 0.0
Function - biology: CDS 3.0 22.0 18.0 0.0 0.0
Name: PR 4.0 25.0 21.0 0.0 0.0
Function: PR 0.0 21.0 21.0 0.0 0.0
Device: PCONS:GFP:RBS:TER XXX 2.0 32.0 30.0 0.0 0.0
Example: CDS 0.0 30.0 30.0 0.0 0.0
Name: Plasmid 1.0 37.0 36.0 0.0 0.0
Device: PBAD:GFP:RBS:TER XXX 0.0 37.0 37.0 0.0 0.0
Device: PCONS:RBS:FLHDC:TER 0.0 37.0 37.0 0.0 0.0
Function: RBS 7.0 50.0 42.0 1.0 0.0
Name: TER 11.0 53.0 42.0 0.0 0.0
Ampicillin antibiotic 26.0 70.0 43.0 9.0 4.0
Function: Plasmid 0.0 44.0 44.0 0.0 0.0
Function: TER 2.0 47.0 45.0 1.0 0.0
BioBricks and devices composition 10.0 61.0 51.0 1.0 0.0
Unequip the movement device: effect 8.0 61.0 52.0 4.0 0.0
Device: AMPR:RBS:PCONS:TER XXX 2.0 54.0 52.0 33.0 19.0
Genotype and phenotype 28.0 82.0 53.0 19.0 33.0
Device: GFP:RBS:PCONS:TER XXX 0.0 56.0 56.0 41.0 21.0
Green fluorescence 7.0 65.0 57.0 10.0 5.0
Device: RBS:PCONS:AMPR:TER XXX 0.0 62.0 62.0 41.0 30.0
Device: RBS:PCONS:FLHDC:TER XXX 4.0 70.0 65.0 37.0 43.0
Want to learn more about Engineering 85.0 76.0 -8.0 0.0 0.0
Want to learn more about Synthetic biology 80.0 76.0 -3.0 0.0 0.0
Want to learn more about Video games 84.0 81.0 -3.0 0.0 0.0
Want to learn more about Biology 85.0 83.0 -2.0 0.0 0.0
Age 32.0 32.0 0.0 0.0 0.0
Gender 26.0 26.0 0.0 0.0 0.0
Interested in video games 77.0 77.0 0.0 0.0 0.0
Interested in biology 74.0 74.0 0.0 0.0 0.0
Studied biology 6.0 6.0 0.0 0.0 0.0
Play video games 74.0 74.0 0.0 0.0 0.0
Heard about Synthetic biology or BioBricks 0.0 0.0 0.0 0.0 0.0
Volunteered to answer more questions 100.0 100.0 0.0 0.0 0.0
Language 6.0 6.0 0.0 0.0 0.0
Enjoyed playing 6.0 80.0 73.0 0.0 0.0
Played Hero.Coli 0.0 100.0 100.0 0.0 0.0
Temporality 0.0 100.0 100.0 0.0 0.0

In [ ]: